** Data reading and variable selection from raw data
** Chinese Housing Survey 1993

** 01. Reading data **

cap log close
clear all
set more off
cd /*insert your working directory here*/
use /*read your data here*/   


** 02. Constructing year and country variables **

ge year=1993
lab var year "survey year"

ge country=156
lab var country "ISO country code"
//China: 156 (ISO Country Codes) 


** 03. ID variables **

ge pid=v0105
lab var pid "person id"
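
// Optional check (a sketch, not part of the original workflow): it is assumed
// here, not confirmed by the survey documentation, that v0105 is meant to
// identify persons uniquely. The commands below only report; they change no data.
duplicates report pid
capture isid pid
if _rc != 0 {
	display as text "note: pid does not uniquely identify observations"
}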


** 04. Basic Demographics (Sex and Age/birth year) **

ge sex=v0111
lab var sex "sex"
lab def sex 1 "male" 2 "female"
lab val sex sex

ge age=v0112
lab var age "age"

//birth year = survey year (1993) minus age at interview
ge birthyr=1993-v0112
lab var birthyr "year of birth"


** 05. Siblings **
ge nbro=v0120
lab var nbro "number of brothers"

ge nsis=v0121
lab var nsis "number of sisters"

ge nsibs=nbro+nsis
lab var nsibs "number of siblings"
//birth order of the respondent not available
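
// Optional check (a sketch): nbro+nsis evaluates to missing whenever either
// component is missing, so nsibs can have more missing values than nbro or nsis.
// The counts below show how many observations are affected.
count if missing(nbro) != missing(nsis)
count if missing(nsibs)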


** 06. Own education **

rename v0116 educ
lab var educ "respondent education"

** 07. Parents' education: Father and/or Mother **

rename v0703 faeduc
lab var faeduc "father's education"
rename v0722 moeduc
lab var moeduc "mother's education"


** 08. Own occupation **

//occupation census code
rename v0606 occ
lab var occ "present occupation census code"
lab def occ 998 "don't know" 999 "no answer"

rename v0644 firstocc
lab var firstocc "first occupation census code"


** 09. Parents' occupation **

rename v0706 faocc
lab var faocc "father occupation census code"

rename v0725 moocc
lab var moocc "mother occupation census code"
lab val occ firstocc faocc moocc occ

** 10. Tabulate the Identified Variables **

log using /*insert your log file path here*/, replace text


** Data reading and variable selection from raw data
** China Housing Survey 1993 

** Sex **
tab sex

** Age, Birth Year **
sum age birthyr, d

** Siblings **
sum nsibs nbro nsis, d

** R's Own Education **
tab1 educ

** Parental Education **
tab1 faeduc moeduc

** R's Own Occupation **
tab1 occ firstocc

** Parental Occupation **
tab1 faocc moocc

log close

** 11. Keep the identified variables only **

keep year country pid sex age birthyr ///
	 nsibs nbro nsis ///
	 educ faeduc moeduc ///
	 occ firstocc faocc moocc


** 12. Save the Data File **

saveold /*insert your output file path here*/, replace



** 13. Homogenising education **
** Own Education **
rename educ educ_cat

ge educ_yrs=0 if educ_cat==1
replace educ_yrs=6 if educ_cat==2
replace educ_yrs=9 if educ_cat==3
replace educ_yrs=12 if educ_cat==4
replace educ_yrs=11 if educ_cat==5
replace educ_yrs=12 if educ_cat==6
replace educ_yrs=15 if educ_cat==7
replace educ_yrs=16 if educ_cat==8
replace educ_yrs=19 if educ_cat==9
lab var educ_yrs "respondent highest education in years"
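
// Optional cross-check (a sketch): the same category-to-years mapping written
// as a single recode, compared against educ_yrs. The name educ_yrs_chk is a
// hypothetical temporary variable introduced only for this check; codes
// outside 1-9 are assumed to map to missing, as in the replace chain above.
recode educ_cat (1=0) (2=6) (3=9) (4 6=12) (5=11) (7=15) (8=16) (9=19) (else=.), gen(educ_yrs_chk)
assert educ_yrs == educ_yrs_chk
drop educ_yrs_chk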

ge educ_ISCED=020 if educ_cat==1
replace educ_ISCED=100 if educ_cat==2
replace educ_ISCED=244 if educ_cat==3
replace educ_ISCED=344 if educ_cat==4
replace educ_ISCED=354 if educ_cat==5
replace educ_ISCED=354 if educ_cat==6
replace educ_ISCED=554 if educ_cat==7
replace educ_ISCED=667 if educ_cat==8
replace educ_ISCED=767 if educ_cat==9
lab var educ_ISCED "respondent highest education in ISCED code"

** Parents Education **

ge faeduc_flag=1 

rename faeduc faeduc_cat
rename moeduc maeduc_cat

ge faeduc_yrs=0 if faeduc_cat==1
replace faeduc_yrs=6 if faeduc_cat==2
replace faeduc_yrs=9 if faeduc_cat==3
replace faeduc_yrs=12 if faeduc_cat==4
replace faeduc_yrs=11 if faeduc_cat==5
replace faeduc_yrs=12 if faeduc_cat==6
replace faeduc_yrs=15 if faeduc_cat==7
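replace faeduc_yrs=16 if faeduc_cat==8
//category 9 mirrors the respondent mapping above (assumed to apply to parents too)
replace faeduc_yrs=19 if faeduc_cat==9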
lab var faeduc_yrs "father's education in years"

ge faeduc_ISCED=020 if faeduc_cat==1
replace faeduc_ISCED=100 if faeduc_cat==2
replace faeduc_ISCED=244 if faeduc_cat==3
replace faeduc_ISCED=344 if faeduc_cat==4
replace faeduc_ISCED=354 if faeduc_cat==5
replace faeduc_ISCED=354 if faeduc_cat==6
replace faeduc_ISCED=554 if faeduc_cat==7
replace faeduc_ISCED=667 if faeduc_cat==8
replace faeduc_ISCED=767 if faeduc_cat==9
lab var faeduc_ISCED "father highest education in ISCED code"

ge maeduc_yrs=0 if maeduc_cat==1
replace maeduc_yrs=6 if maeduc_cat==2
replace maeduc_yrs=9 if maeduc_cat==3
replace maeduc_yrs=12 if maeduc_cat==4
replace maeduc_yrs=11 if maeduc_cat==5
replace maeduc_yrs=12 if maeduc_cat==6
replace maeduc_yrs=15 if maeduc_cat==7
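replace maeduc_yrs=16 if maeduc_cat==8
//category 9 mirrors the respondent mapping above (assumed to apply to parents too)
replace maeduc_yrs=19 if maeduc_cat==9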
lab var maeduc_yrs "mother's education in years"

ge maeduc_ISCED=020 if maeduc_cat==1
replace maeduc_ISCED=100 if maeduc_cat==2
replace maeduc_ISCED=244 if maeduc_cat==3
replace maeduc_ISCED=344 if maeduc_cat==4
replace maeduc_ISCED=354 if maeduc_cat==5
replace maeduc_ISCED=354 if maeduc_cat==6
replace maeduc_ISCED=554 if maeduc_cat==7
replace maeduc_ISCED=667 if maeduc_cat==8
replace maeduc_ISCED=767 if maeduc_cat==9
lab var maeduc_ISCED "mother highest education in ISCED code"
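
// Optional cross-check (a sketch): the parental ISCED mapping expressed as one
// recode inside a loop over the prefixes fa and ma, compared against the
// variables built above. The *_isced_chk names are hypothetical temporaries
// used only for this check and dropped afterwards.
foreach p in fa ma {
	recode `p'educ_cat (1=20) (2=100) (3=244) (4=344) (5 6=354) (7=554) (8=667) (9=767) (else=.), gen(`p'_isced_chk)
	assert `p'educ_ISCED == `p'_isced_chk
	drop `p'_isced_chk
}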


** 14. Homogenising siblings **
//cutoff
ge nbro_flag=99
lab var nbro_flag "cutoff of number of brothers"
ge nsis_flag=99
lab var nsis_flag "cutoff of number of sisters"
ge nsibs_flag=99
lab var nsibs_flag "cutoff of total number of siblings"

lab def nsib_flag 99 "no cutoff"
lab val nbro_flag nsis_flag nsibs_flag nsib_flag


** 15. Tab Education and Sibling Variables **
tab1 sex age birthyr
tab1 educ_cat educ_yrs faeduc_cat faeduc_yrs maeduc_cat maeduc_yrs faeduc_flag 
tab1 nbro nsis nsibs nbro_flag nsis_flag nsibs_flag


** 16. Save the Data File **

saveold /*insert your output file path here*/, replace

